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Abstract 



We review basic ideas and recent developments on the determination of the parton substructure of the 
nucleon, in view of applications to precision hadron collider physics. We review the way information 
on PDFs is extracted from the data exploiting QCD factorization, and discuss the current main two 
approaches to parton determination (Hessian and Monte Carlo) and their use in conjunction with 
different kinds of parton parametrization. We summarize the way different physical processes can be 
used to constrain different aspects of PDFs. We discuss the meaning, determination and use of parton 
uncertainties. We briefly summarize the current state of the art on PDFs for LHC physics. 
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1 QCD in the LHC era 



The theory and phenomenology of the strong interactions pQ have witnessed an impressive development 
in the last two decades, driven first by the availability of HERA [2] — a QCD machine — and then 
by the needs of present (Tevatron) and especially upcoming (LHC) hadron colliders [3]. The LHC will 
be looking for new physics in hadronic collisions. 

The last time this happened was back in the early eighties, when the W and Z were discovered at 
the SPS collider [5j — and, of course, one may argue to which extent the W and Z then were genuinely 
physics. At the time, QCD was at best a semi-quantitative theory: for example, in Ref. [5] a 



' new 



measured W cross section of 0.63±0.10 nb (at yfs = 630 GeV) was described as "in agreement with the 
theoretical expectation" [6] of 0.47 j^o! n b- One reason why at that time a NLO calculation couldn't 
be expected to agree with the data to better than 20% is that the knowledge of nucleon structure was 
at the time extremely sketchy: a parton set consisted of three parton distributions (valence, quark 
sea, and gluon), differences at the 30% level between sets would be standard, and, of course, there 
would be no idea on the associate uncertainty (see Fig. []]). 
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Figure 1: Comparison of two parton distribution sets 0[8] from the early eighties (From Ref. [9]). 



The evolution in time of parton distributions (see Fig. ([2])) since then shows that it is only during 
the HERA age that predictions from different groups converged: this is both a consequence and a cause 
of the fact that perturbative QCD has now turned into a quantitative theory, which leads to predictions 
for hard processes with typical accuracies below 10%, and often of a few percent. Perturbative QCD 
today is an integral part of the Standard Model, and it is tested to an accuracy which is comparable to 
that of the electroweak sector: in fact, HERA has played for QCD a similar role as LEP for electroweak 
theory. In the last decade, theoretical and phenomenological progress has been impressive: at the LHC 
we can envisage quantitative control of QCD contribution to collider signal and background processes 
at the percent level, as will be necessary for discovery at the LHC [3]. 

Progress in QCD has taken place in (at least) five distinct directions, namely (listing from the 
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Figure 2: Historical evolution of the up (left) and down (right) quark distributions (from Ref. [10J). 



bottom beam nucleons up to the final state): First, the understanding of the structure of the nucleon 
in terms of parton distributions has now become a quantitative science. Second, perturbative compu- 
tations are being pushed to hard processes with increasingly high numbers of particles and at increas- 
ingly high orders, thanks to the development of a variety of techniques which include twistor methods, 
analyticity techniques, and the use of exact results from supersymmetric QCD and the AdS/CFT 
duality. Third, all-order resummation of perturbation theory is being extended in various kinematic 
regimes (small x and large x) to new classes of observables (typically less inclusive), to higher loga- 
rithmic orders, and it is being accomplished using perturbative renormalization-group methods, path 
integral techniques, and effective field theory methods. Fourth, definitions of jet observables which 
are both consistent with perturbative factorization to all orders and numerically efficient have been 
constructed theoretically and implemented in computer interfaces. Fifth, new collinear subtraction 
algorithms have been developed which make the development and implementation of next-to-leading 
order Monte Carlo codes possible. 

The lectures at the Zakopane school on which this paper is based, ambitiously entitled "QCD at the 
dawn of the LHC", covered the first three of these topics: parton distributions (PDFs), perturbative 
computations, and resummation. Here we will concentrate on PDFs; recent good reviews of progress 
in perturbative computations are in Refs. [Ill I12j . while a comprehensive overview of resummation 
is unfortunately not available yet. At Zakopane, jets and Monte Carlos where discussed by other 
speakers; excellent recent reviews of these topics are in Refs. |13t [T4"] respectively. 

The purpose of this overview of PDFs is both to provide an elementary introduction to the subject, 
and also a summary of recent developments, several of which are little known outside a small group 
of practitioners. Progress in this field has been largely driven by two series of HERA-LHC workshops 
2004-1005 and 2006-2007, which have organized and stimulated the transfer of know-how from deep- 
inelastic scattering to hadron collider physics, and whose results are collected in the respective reports 
Refs. |15l 116]. Since 2007, the PDF4LHC working group has been formed [T7] with a mandate from 
the CERN directorate to steer and coordinate research on PDFs for the LHC community: many of 
the more recent ideas discussed here were developed in the context of this working group. 

This review is organized as follows. We will start with the more basic concepts, then work our 
way to somewhat more advanced developments. First, we will very briefly review some basic (mostly 
kinematic) facts on QCD factorization. We will then present the two main existing approaches (Hessian 
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and Monte Carlo) to the determination of PDFs and the way they are used in conjunction with various 
forms of parton parametrization. Next, we will review standard ideas on how information on PDFs 
can be extracted from the data. We will then discuss in some detail the problem of PDF uncertainties 
- what they mean and how they are determined. In the final section we will briefly summarize the 
state of the art: the role of theoretical uncertainties, and the current understanding of standard candle 
processes at the LHC. 

These lectures are dedicated to the memory of Wu-Ki Tung, who pioneered this field, pursued it 
for more than 30 years, and shaped much of our current understanding of it. 

2 Factorization 

Factorization of cross sections into hard (partonic) cross sections and universal parton distributions 
is the basic property of QCD which makes it predictive in the perturbative regime, and which enables 
a determination of parton distributions. Here we only review some basic facts which will be useful 
for our subsequent discussion, while referring to standard textbooks [18] and recent reviews [1] for a 
detailed treatment. 

2.1 Electro- and hadro-production kinematics 

The basic factorization for hadroproduction processes has the structure 

a x (s,M x ) = ^2 dx 1 dx 2 f a / hl (x 1 ,Mx)f b/h2 (x2,Mx)a ab ^x{xix 2 s,Mx) 

^E/ 1 — r —fa /h A^Mx)f b/h ^x 2 ,Mx)c(—,a s (Mx)) (1) 

n b Jt X\ J T / Xl X 2 \X\X 2 J 

-£(x)c(^,a s (M x )), 



a.b 

1 



where f a /hi( x i) is the distribution of partons of type a in the i-th incoming hadron; a qaqb ^x is the 
parton- level cross section for the production of the desired (typically inclusive) final state X; the 
minimum value of Xi is x m i n = r, with 

(2) 

s 

the scaling variable of the hadronic process, and in the last step we defined the parton luminosity 

£0=) = f 1 -fa/ hl {^Ml)f b/h J-,M x )= f-f a/h2 (z,Ml)f b/h (*,Mx). (3) 

J x Z \Z / J x Z \Z ' 



The hard coefficient function C (r, a s (M x )) is defined by viewing the parton-level cross section as 
s of the partonic subprocess: 



a function of the hard scale M x and the dimensionless ratio of this scale to the center-of-mass energy 



(4) 

S X\X 2 

in terms of the scaling variable. At the lowest order in the strong interaction, the partonic cross section 
is then either zero (for partons that do not couple to the given final state at leading order), or else 
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just a function fixed by dimensional analysis times a Dirac delta, and the hard coefficient function is 
thus defined as 

a ab ^ x {s,M x ) = a C ab (r,a s {M 2 x )) ; C ab (x,a s (M x )) = c ab 5(l - x) + 0{a 8 ), (5) 

where c ab is a matrix with non-vanishing entries only between quark and antiquark states, which will 
be discussed explicitly in Sect. 12.21 below (see in particular Eqs. (jl8M19p . For example, for virtual 
photon (Drell-Yan) production c ab is nonzero when ab is a pair of a quark and an antiquark of the 
same flavour, and in such case <jq = %' Ka \- 

The factorized result Eq. ([2]) holds both for inclusive cross sections and for rapidity distributions.: 

^(r, Y, Ml) _ £ £ to £ to fi^MDl^Ml) *jL. (-L-,y, M ) , (6) 

where the hadronic cross section is differential with respect to the rapidity Y of the final state X, 
while the partonic cross section is differential in partonic rapidity 

, = y-i ta |= (7) 

the effect of parton emission from the incoming hadrons is to perform a Lorentz boost from the 
hadronic center-of-mass frame to a frame in which the energy of each of the two incoming hadrons are 
rescaled by x\ and x 2 respectively. The lower limit of integration x m { n is then fixed by requiring that 
the rapidity of the incoming partons be at least sufficient to yield the observed final state rapidity: 

x = x = ^ e ~Y_ (8) 

At leading order, the two partons couple directly to the final state so y = and 

*io = >|. (9) 

Equation (j2J) is to be contrasted with the standard factorization for the deep-inelastic structure 
functions Fi(x,Q 2 ): 

F t (x,Q 2 ) = xY, f -a(-,« s (Q 2 ))/i(^Q 2 ). (10) 
• J x z \ z J 

Here in the argument of the structure function x = is the standard Bjorken variable and the hard 
coefficient function is the structure function computed with an incoming parton and fi(z,Q 2 ) is the 
distribution of the i-th parton in the only incoming hadron. Also in this case at lowest 0(a®) it is 
either zero (for incoming gluons) or a constant (an electroweak charge) times a Dirac delta. 

Note that the structure functions are related to the cross section which is actually measured in 
lepton-hadron scattering by 

d -^aW {x ^ Q2) = [Y + F 2 NC {x,Q 2 )^Y.xF^(x,Q 2 )-y 2 F^{x,Q 2 )] , (11) 

for neutral-current charged lepton £^ DIS, where the longitudinal structure function is defined as 

F L {x,Q 2 ) = F 2 {x,Q 2 ) - 2xF l {x,Q 2 ), (12) 
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proton - (anti)proton cross sections 




Vs (TeV) 



Figure 3: LHC kinematics (left) and processes (right) (from Ref. |15j). 



and 

Y± = l±(l-y) 2 , (13) 
in terms of the electron momentum fraction 

V= — r = — , (14) 
p ■ k xs 

(not to be confused with the partonic rapidity Eq. ([7])) where p and k are respectively the incoming 
proton and lepton momenta, q is the virtual photon momentum (q 2 = —Q 2 ) and in the last step, 
which holds neglecting the proton mass, s is the center-of-mass energy of the lepton-proton collision. 
Similar expressions hold for charged-current charged and neutral lepton scattering. 

The set of values of y over which the PDF is probed is of course the same in the hadro- and 
leptoproduction cases, and it ranges from the scaling variable of the hadronic process to one: x < y < 1 
in Eq. (|10p . and r < x±, x% < 1 in Eq. ([2]). The kinematic region which is typical of of collider (HERA) 
or fixed-target DIS experiments is compared in Fig. [3] to that of LHC processes, whose typical cross 
sections are also shown. 

There is an important kinematic difference when comparing the hadronic and deep-inelastic fac- 
torization formulae, Eqs. ([2]) and (fTUj) respectively. This is related to the fact that the leading order 
coefficient function is proportional to a Dirac delta. For DIS, this implies that at leading order, the 
value of the structure function at given x determines the quark distributions at the same value of x, 
and it is only at next-to-leading order, where the coefficient function has a nontrivial dependence on 
x, that the PDF is probed for all values x < y < 1. But for hadronic processes, because there are two 
partons in the initial state, even at leading order, for inclusive cross sections the delta kills one but 
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not both of the convolution integrals in Eq. ([2]), so all values r < x% < 1 are probed. However, for 
rapidity distributions because of the further kinematic constraint Eq. ([9]) the leading order kinematics 
is also fixed, and for given Y and M x the momentum fractions of both partons are fixed. 

2.2 Constraints on PDFs 

The kinematics of the factorized expressions Eqs. ([2]) and (jlOp immediately implies that, as discussed in 
Sect. 12.1] at leading order deep- inelastic structure functions and rapidity distributions provide a direct 
handle on individual quark and antiquark PDFs (DIS), or pairs of PDFs (Drell-Yan). It is possible to 
understand what is dominantly measured by each individual process by looking at the leading order 
expressions, bearing in mind that, of course, beyond leading order all other contributions turn on 
(and that NLO corrections can be quite large, in fact of the same order of magnitude as the LO for 
Drell-Yan). 

The leading order contributions to the DIS structure functions F\ and F% are the following (at 
leading order Fi = 2xF\): 

NC J Pi 7 = E l ^ 2 (% + %) 

NC Fx z ' int - = EiBi(qi + qi) 

NC F 3 z > int - = £<A(fc + fc) (15) 
CC F] v+ = u + d + s + c 

CC -F^ + /2 = u-d- s + c, 

where NC and CC denotes neutral or charged current scattering and we have lumped together the 
contributions coming from Z exchange and from jZ interference, with couplings given by 

B q (M 2 x ) = -2e q V e V q P z + (yf + A 2 ){V 2 + A 2 q )P 2 ; (16) 
D q (M 2 x ) = -2e q A e A q P z + AV t A,V q A q P 2 z (17) 

in terms of the electroweak couplings of quarks and leptons listed in Table [1] and the propagator 
correction P z = M X /(M X + M|). 



fermions 


e f 




Af 


u,c,t 


+2/3 


(+1/2 -4/3 sin 2 d w ) 


+1/2 


d,s,b 


-1/3 


(-1/2 + 2/3 sin 2 6 W ) 


-1/2 







+1/2 


+1/2 




-1 


(-1/2 + 2 sin 2 6 W ) 


-1/2 



Table 1: Electroweak couplings of fermions. 



The leading order contribution to Drell-Yan is given by 

w % = ^^T, iJ \v§ KU \L i KA^) (is) 

z ^ = ^^ Et{ v 2 + A 2 )L^, X ^ 
in terms of the differential leading order parton luminosity 

L ij (x h x 2 ) = q i (xi,Mj [ )q j (x2,M x ) + q i (x2,M x )q j (x 1 ,Mj [ ) (19) 



6 



Figure 4: Leading order diagrams for inclusive jet production from gluons 



and the CKM matrix elements Vij, with x® given by Eq. ([8]). This shows explicitly that, as already 
mentioned, for a rapidity distribution the leading order parton kinematics (i.e. the values of Xj) is 
completely fixed by the hadronic kinematics (i.e. the values of y and Mjf). 

Note that while at a pp collider (or when a p beam collides with a p fixed target) such as the LHC 
it makes no difference whether the incoming quark and antiquark come from either of the initial-state 
hadrons, at a pp collider such as the Tevatron (or when a p beam collides with a deuterium fixed 
target) there are two different contributions, according to whether each of the incoming partons is 
extracted from either of the initial-state hadrons. 

Leading-order information on the gluon can be extracted from jet production (see Fig. |4|), or from 
scaling violations, as measured for instance by the Q 2 dependence of deep- inelastic structure functions. 
The latter are coupled to the gluon even at leading order through the singlet QCD evolution equations, 
which in terms of Mellin moments 

fi(N,Q 2 )= f 1 dxx N - 1 f i (x,Q 2 ) (20) 
Jo 

of parton distributions take the form 

d_ (Z(N,Q 2 )\ <*s(t) (rf g (N,a s (t)) 2n n S g (N,a s (t))\^(m,Q 2 )\ m) 

dt \g(N,Q 2 )) 2vr ^(JV,a,(t)) J* g (N,a s (t)) J * \g(N,Q 2 ) J ' [Zl) 

J t «3 S (N,Q 2 ) = ^% S (N,a s (t))$ S (N,Q 2 ) (22) 

where the singlet combination of quark distributions is defined as 

n f 

£(s, Q 2 ) = Y, (ftfo Q 2 ) + Qi(x, Q 2 )) , (23) 
i=i 

and the remaining nonsinglet combinations can be taken as any linearly independent set of 2nt — 1 
differences of quark and antiquark distributions, qfj (N,Q 2 ) = q^ s (N,Q 2 ) — (N,Q 2 ) which all 
evolve according to individual, decoupled equations. 

The leading order anomalous dimensions are shown in Fig. [5j while at leading order all nonsinglet 
7^' s are equal to each other and are also equal to -y qq . The qualitative behaviour of perturbative 
evolution is then deduced recalling that Mellin transformation maps the large (small ) x — > 1 {x — > 0) 
region into the large (small) N — > oo (N —> 0) region. A first relevant feature is that as the scale 
increases all PDFs decrease at large x and increase at small x. A second important feature is that 
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because the gluon has the rightmost singularity at small N it drives small x scaling violations, and 
thus in particular at sufficiently small x and large Q 2 all PDFs have the same shape, driven by the 
gluon. Finally, the evolution of the gluon (driven by j gg ) is strongest at either large or small x but 
its coupling to the quark (driven by ^ qg ) is only large at small x, so it is only at not too large x that 
scaling violations provide leading constraints on the gluon. 

Finally, it is important to note that constraints on PDFs come from their cross-talk imposed by 
sum rules: specifically the conservation of baryon number 

I dx (u p (x, Q 2 ) - u p {x, Q 2 )) = 2 = 2 f dx (d p {x, Q 2 ) - d p {x, Q 2 )) (24) 
Jo Jo 

and the conservation of total energy-momentum 



dxx 



N f 



J2i<i l (x,Q 2 ) + Ux,Q 2 ))+g(x,Q 2 



i=l 



(25) 



Clearly, these sum rules provide constraints on the behaviour of parton distributions even in the region 
where there are no data. 



3 Statistics 

A determination of parton distributions is a determination of at least seven independent functions: 
three light quark and antiquark distributions and the gluon at some initial scale, from which PDFs 
at all other scales can be obtained solving evolution equations. More functions must be determined if 
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one wishes to keep open the possibility [19] that heavy quarks PDFs are at least in part of "intrinsic" 
nonperturbative origin, rather than being determined radiatively from gluons by QCD evolution. A 
determination of PDFs with uncertainties thus involves determining a probability distribution in a 
space of several independent functions. Because experimental data used for this determination will 
always be finite in number, this is in principle an ill-posed (or unsolvable) problem. 

The time- honored |20[|21j method to make this problem tractable is to assume a specific functional 
form for parton distributions, which projects the infinite dimensional problem onto a finite-dimensional 
parameter space. This method is justified because PDFs are expected to be smooth functions of the 
scaling variable x. Because < x < 1, a representation of these functions with finite accuracy must be 
possible on a finite basis of functions: hence, a representation of PDFs must be possible in terms of a 
finite number of parameters. The problem is then reduced to the choice of an optimal parametrization, 
namely, one that for given accuracy minimizes the number of parameters without introducing a bias. 
We will discuss below two such parametrizations. 

Whatever the parametrization, determining a set of PDFs involves computing a number of physical 
processes with a given set of input PDFs, and extremizing a suitable figure of merit, such as a x 2 or 
likelihood function in order to determine a best-fit set of PDFs. Existing sets of parton distributions 
which are made available for the computation of LHC processes through standard interfaces are 
determined and delivered following two main strategies: a "Hessian" approach, in which the best-fit 
result is given in the form of an optimal set of parameters and an error matrix centered on this optimal 
fit to compute uncertainties, and a Monte Carlo approach, in which the best fit is determined from 
the Monte Carlo sample by averaging and uncertainties are obtained as variances of the sample. 

It turns out that available Hessian PDF sets are mostly based on a "standard" parametrization, 
inspired by various QCD arguments. On the other hand, the only available full Monte Carlo PDF set 
is based on a rather different form of parametrization, which adopts neural networks as interpolating 
functions in an attempt to reduce the bias related to the choice of functional form [22]. However, 
Monte Carlo studies based on other standard [23] and non-standard [23] parametrizations have also 
been presented. 

Here we will summarize the main features of both the Hessian and Monte Carlo approach, and in 
each case also discuss the parton parametrization which is most commonly used with each approach, 
and the way the best fit is determined in each case — which in turn requires a peculiar algorithm 
within the neural network approach. 

3.1 Hessian uncertainties and the "standard" approach 

The standard approach to PDF determination is based on assuming for PDFs at some reference scale 
Qo a functional form inspired by counting rules [25], which suggest that PDFs should behave as 
fi(x) ~ (1 — x)f, and Regge theory, which suggest [26] that they should behave as fi(x) ~ x ai . 

Note that these limiting behaviours are necessarily approximate, because even if they hold at some 
scale, at any other scale perturbative evolution will correct them by logarithmic terms which behave 
as ln(l — x) as x — > 1 and as In x as x —> 1. Therefore, even if counting rules and Regge theory actually 
provide predictions for the values of the exponents and a, respectively (for given parton and parent 
hadron), they are taken as free fit parameters. 

Based on this, typically PDFs are assumed to have the form 




(26) 
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where gi(x) tends to a constant for both x — )■ and x — > 1. For instance, the CTEQ/TEA collaboration 
assumes generally [271 12H] 



xf(x, Qq) = aox ai (1 — x) Q2 exp (03a; + a^x 2 + a^\/x + clqx a? ) 



(27) 



with different parameters aj for each PDF, but some parameters fixed or set to zero for some PDFs 
- for example, parameters a§ and aj are nonzero only for the gluon distribution. Other groups 
assume that gi(x) is a polynomial in x or in y/x: for instance HERAPDF [29] assumes <?j(x) = 
1 + ei^fx + DiX + EiX 2 . 

Different choices are possible for the set of linearly independent combinations of PDFs for which 
the parametrization Eq. (]26p is adopted, and for the total number of free parameters to be used. 
For instance CTEQ/CT parametrizes the "valence" light combinations u v = u — u, d v = d — d, the 
antiquark distributions u and d, the two strangeness combinations s = s±s (but in the CTEQ6.6 [27] 
and CT10 [28] fits it is assumed that s - s = 0) and the gluon, with 22 (CTEQ6.6) or 26 (CT10) free 
parameters; MSTW08 [30J parametrizes also u v , d v , s^ 1 and the gluon, and then the two combinations 
u ± d with a total of 28 parameters, and so forth. 

Given a parametrization of PDFs, the problem is reduced to that of determining best fit values 
and uncertainty ranges for the parameters. In a Hessian approach, this is done by minimizing a figure 
of merit such as 



where the sum runs over all data points, di are experimental data with experimental covariance 
matrix cov^- (including all correlated and uncorrelated statistical and systematic uncertainties), di(a) 
are theoretical predictions which are obtained by evolving the starting PDFs at any scale Q 2 using the 
evolution equations Eq. (|22p , and then folding the result with known partonic cross sections according 
to the factorization theorems Eqs. (|2|6)10p . and a denotes the full set of parameters on which the 
PDFs at scale Qq depend, which we may view as a vector in parameter space (which is 26-dimensional 
for CT10, and so on). The x 2 thus is a function of the a through the predictions d~i(a). 

Note that the x 2 Eq. (|28p is normalized to the number of data points: this is conventionally done 
in order to allow for approximate comparisons of fit quality between fits with different numbers of data 
points; in practice, this is likely to be close to the x 2 P er degree of freedom because typical datasets 
include thousands of data, while the total number of parameters needed to describe accurately all 
PDFs with functional forms like Eq. (|26p . though of course unknown, is likely to be rather lower than 
a hundred. For the sake of future discussions it is convenient to also introduce an unnormalized 



It is important to note that there are subtleties in the definition of the x 2 5 which may make the 
comparison of x 2 values from different groups only qualitatively significant, because slightly different 
definitions are used. The main subtlety is related to the inclusion of normalization uncertainties, 
which cannot be simply introduced in the covariance matrix, as this would bias the fit [31j : a full un- 
biased solution [32] requires an iterative construction of the covariance matrix, but other approximate 
solutions are also adopted. 

Once the x 2 1S defined, for given data x 2 1S a function of the PDF parameters through the predic- 
tions d{ which in turn depend on the PDFs. Hence, the best fit set of parameters can be identified 
with the absolute minimum of the x 2 m parameter space. Furthermore, the variance of any observable 




(28) 



X 2 = N AatX 2 . 



(29) 
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X which depends on parameters a (such as a physical cross section, or indeed the PDFs themselves), 
if we assume linear error propagation X(a) « Xq + aidiX(z), is given by 

a\ = (JijdiXdjX. (30) 

Here Oij is the covariance matrix of the parameters which, in turn, assuming that the x 2 is a quadratic 
function of the parameters in the vicinity of the minimum, is given by (see e.g. |33t I34j) 

Oij = didjX 2 1 min (31) 

i.e. it is the (Hessian) matrix of second derivatives of the unnormalized x 2 Eq. (|29p . evaluated at its 
minimum. 

The Hessian method for the determination of uncertainties thus in particular implies that the one- 
a (i.e. 68% confidence level) for the parameters themselves is the ellipsoid in parameter space which 
is fixed by the condition x 2 = Xmin + 1 • As we will discuss in Sect. 15.11 in practice this argument 
may have to be modified in realistic cases, in order to account for various effects (such as incorrect 
estimation of the covariance matrix of the data) . 

However, for the time being let us stick to the textbook argument, and make a couple of observa- 
tions on it. The first observation is that we are always free to adjust the parametrization in such a 
way that all eigenvalues of the Hessian matrix Cij are equal to one, by simply diagonalizing the matrix 
and rescaling the eigenvectors by the eigenvalues, i.e. by looking for new parameters a'j{a{) such that 

AW 

a l3 ( ai - af*)( aj - af a ) = ^ ( a * > ( 32 ) 

i=i 

which immediately implies that 

a\ = |V'X| 2 , (33) 

where the gradient is computed with respect to a'. Equation (j33|) has the immediate interesting 
consequence that the total contribution to the uncertainty due to two different sources, being the length 
of a vector, is simply found by adding the components i.e. the different uncertainties in quadrature 
(even when the two uncertainties are correlated). This has been emphasized recently in Ref. [35], 
where it is shown explicitly that, contrary to what one may naively think, the total uncertainty due 
to PDF parameters and some other parameter (such as the value of the strong coupling constant) is 
simply found adding the two uncertainties in quadrature. 

The second comment has to do with the fact that the one-u interval in parameter space corresponds 
to the contour AN^tX 2 = 1 about the minimum. This is identical to the statement that the Hessian 
Eq. (|3ip is the covariance matrix in parameter space. This simple fact is sometimes source of confusion 
because it seems to contradict the observation that the standard deviation of the (unnormalized) x 2 
distribution with A^ f degrees of freedom is in fact, sometimes (see e.g. |36j) it is incorrectly 

stated that one-cr contours correspond to Ax 2 ~ Ndof- However, the contradiction is only apparent: 
Ax 2 ~ -/Vdof sets a hypothesis-testing criterion [37], namely, it gives the size of fluctuations of Ax 2 
upon repetition of the experiment, and thus the range of x 2 values away from the mean (x 2 ) = A^j a t 
which are acceptable for a given theory (experiment). One the other hand Ax 2 = 1 provides a 
parameter- fitting criterion [37]: it gives the range of parameter values which are compatible at one 
sigma for a given experimental result (and theory). 

A simple example may help in understanding the distinction. Consider the case of a simple 
linear fit, in which one has a set of data which are expected to satisfy a linear law y = x + k, with 
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2.5 5 7.5 10 12.5 

Figure 6: Fit to data Gaussianly distributed about a linear law. The solid line is the true underlying 
law, the dashed line is the best-fit, and the dotted lines give the one-a (Ax 2 = 1) error band, which 
is by -jy— smaller than the error on each point. 



unknown intercept k that one wishes to determine by fitting to data (see Fig. [6]). Define the deviation 
Aj = di — di(xi) between the i-th data point di and the linear prediction d{ = Xi + k. If Aj are 
gaussianly distributed with standard deviation a about their true values, then clearly the average 
square deviation a\ = (A 2 ) = -/V^cr 2 . This is the "hypothesis testing" fluctuation range of the x 2 - 
However, the best-fit intercept k is just the average deviation k = (A), and the square uncertainty 

2 

on it is ct 2 = jff^'- so ^ ne "parameter fitting" range for k is indeed by a factor iVdat smaller than the 
expected total square fluctuation, because the best-fit value is determined as a mean, whose square 
fluctuation is by a factor A^ a t smaller than the fluctuation of the individual data. 

3.2 Monte Carlo uncertainties and the NNPDF approach 

A Monte Carlo approach differs from the Hessian approach in the way the uncertainty on the observable 
is determined in terms of the uncertainties in parameter space: the distinction Hessian vs. Monte 
Carlo thus has to do only with the way uncertainties are propagated from parameters to observables. 
However, the Monte Carlo way of propagating uncertainties is especially convenient when used together 
with a parametrization whose functional form is less manageable, for instance because the number of 
parameters is particularly large, or because the functional form is less simple than that of Eq. (126|) . 
or, more, in general, whenever linear error propagation and the quadratic approximation to the x 2 
in parameter space are not advisable, for reasons of principle or of practice. Therefore, we will first 
discuss the distinction between Hessian and Monte Carlo per se, then turn a brief review of the way a 
Monte Carlo approach has been used by the NNPDF group together with a choice of basic underlying 
functional form for PDFs which differs from that of Eq. (I26p , and finally address some issues related 
to the determination of the best fit PDFs when such functional forms are adopted. 
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Figure 7: Schematic representation of the construction of a Monte Carlo representation of parton 
distributions. 



3.2.1 Monte Carlo uncertainties 

Whereas in a Hessian approach parameters are assumed to be gaussianly distributed with covariance 
matrix given by the Hessian Eq. (|31j) . in a Monte Carlo approach the probability distribution 
in parameter space is given by assigning a Monte Carlo sample of replicas of the total parameter 
set. For example, if one uses the parametrization Eq. (|26|) one would then simply give a list of A^ ep 
replica copies of the vector of parameters s. Any observable X is then computed by repeating its 
determination Ar ep times, each time using a different parameter replica: the central value for X is the 
average of these results, the standard deviation is the variance, and in fact any moment of the 
probability distribution can be determined from the sample of A^ rep values of X thus obtained. 

Of course, this begs the question of how the distribution of parameter values, i.e. the distribution 
of parameter replicas is determined in the first place. In fact, this may look like a hopeless task: 
let's say that for each parameter the probability distribution in parameter space is given for each 
parameter as a histogram with three bins, one corresponding to the one-u region about the central 
value of the given parameter, and two for the two outer regions. Then, for Ap ar parameters the total 
number of bins is equal to 3 Npm > 3 • 10 9 with Ap ar = 20 parameters. Hence it looks like the total 
number of replicas must be hopelessly large in order to have sufficient statistics. This, however, is 
not necessarily the case, because it may well turn out that most of the bins are actually empty. To 
understand this, recall the Hessian computation of the uncertainty on X Eq. ([33]): it is clear that in 
order to determine the uncertainty on X, it is sufficient to know the distribution in parameter space 
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Figure 8: Scatter plot of central values and uncertainties of a Monte Carlo sample compared to the 
data for the dataset of the NNPDF1.2 parton fit [38] . 



along the direction of V'X. Hence, for this specific observable only one parameter is relevant. Even if 
one wants to determine the uncertainty on observables which probe any direction in parameter space, 
for any reasonably smooth function the number of bins which is needed in order to get an accurate 
representation of the probability distribution is likely to be much smaller than 10 9 . This then again 
raises the question of how one should sample the replica distribution in parameter space. 

The answer is found by noting that the maximum likelihood method gives a way of mapping the 
probability distribution in data space onto the probability distribution in parameter space. Namely, 
assume one has data di with covariance matrix covjj. Then, generate iV re p data replicas df , with 
a = 1, 2, ... , iV re p. For each value of a, i.e. for each replica, the whole set of data i = 1, 2, . . . iVdat 
is replicated, in such a way that if one takes the average over the N rep replicas d" of the n-th data 
point, then in the limit N rep — > oo this average tends to the original data value d n ; if one computes the 
variance of these N rep values in the same limit it tends to to the the standard deviation of the data; 
and if one computes the covariance of the n-th and m-th data replicas it tends to the covariance matrix 
element cov nm . Now, for each data replica, determine a best-fit parameter vector a a by minimizing 
the x 2 Eq. (|28p . but of the fit to the replica data df, rather than the original data. We end up with 
a Monte Carlo set of best-fit parameter vectors a a : again, the average over these a + 1, 2, ... , iV rep 
vectors a a gives us the best-fit parameters a a , and the covariance of the n-th and m-th components 
of the parameter vector gives us the covariance matrix a mn . In fact, it is easy to check (see e.g. |32j) 
that for gaussianly distributed data the results coincides with the Hessian covariance matrix Eq. (|3ip . 
We will see an explicit numerical check of this equivalence in Fig. [24] below. 

The procedure is summarized in Fig. [7) one starts with experimental data (denoted as Fj in the 
Figure), generates data replicas (denoted as Fi(l) . . . Fj(N)) and fits a set of PDFs to each data replica 
(denoted as qo (£)))• The PDFs can be parametrized in any desired way at some reference scale, and 
they are fitted to the data replicas in the way discussed in Sect. 13.11 namely by evolving them to the 
scale of the data, using them to compute observables, and minimizing the x 2 of the comparison to the 
data with respect to the parameters. 

But then, the problem of constructing an adequate sampling of parameter space has been reduced 
to that of constructing an adequate Monte Carlo representation of the original data: i.e. the space 
of parameters is sampled in a way which is determined by the distribution of the data ("importance 
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sampling"). Whether a given set of replicas provides an accurate enough representation of the data 
is then something that may be checked explicitly for a given sample, by comparing means, variances 
and covariances from the sample with the desired features of the data. For a typical set of data used 
in a parton fit the numbers of replicas required turn out to be surprisingly small: for instance, in 
Fig. [S]we show a scatter plot of the averages vs. central values and variances vs. standard deviations 
for the set of N^t = 3372 data points included in the NNPDF1.2 [38] parton fit, computed using 
iV re p = 10, 100, 1000 Monte Carlo Replicas. It is clear that the scatter plot deviates by just a few 
percent from a straight line already for N rep = 10 for central values, and for N rcp = 100; iV re p = 1000 
replicas turn out to be only necessary in order to get percent accuracy on correlation coefficients. 

It should finally be mentioned that within a Monte Carlo approach it is possible to sidestep the 
problem of choosing an adequate parametrization by using Bayesian inference [39j . Namely, one starts 
from some prior Monte Carlo representation of the probability distribution based on some initial 
subset of data, or even on assumptions. Then, the initial Monte Carlo set is updated by including 
the information contained in new data through Bayes' theorem. Without entering into details, it is 
clear that this can be done by changing the distribution of replicas: more or less copies are taken for 
those replicas which agree or respectively do not agree with the new data, in a way which is specified 
by Bayes' theorem. To the extent that results do not depend too much on the choice of prior, which 
is often the case if the information used through Bayes' theorem is sufficiently abundant, final results 
are then free of bias. Whereas the construction of a parton set fully based on this method has so far 
not been completed, preliminary results have been presented [40] on the inclusion of new data in an 
existing Monte Carlo fit using this methodology. 



3.2.2 Neural Network parametrization and cross-validation 

The Monte Carlo approach has been recently used for the determination of a PDF set in conjunction 
with the use of neural networks as a parton parametrization. Neural networks are just another 
functional form. In analogy to polynomial forms, they have the feature that any function (with 
suitable assumptions of continuity) may be fitted in the limit of infinite number of parameters; unlike 
polynomial forms they are nonlinear, and they are "unbiased" in that a finite-dimensional truncation 
of the neural network parametrization is adequate to fit a very wide class of functions (for instance, 
both periodic and non periodic) without the need to adjust the form of the parametrization to the 
desired problem. 

A very simple example of neural network is the function 

/(*) = W ci) ( 34 ) 

fl(3) "11 "12 

1 _|_ g l + e H l XUJ 11 1 + e 2 XU 21 



where 0$ and uinm are free parameters. This is a 1 — 2 — 1 neural network, parametrized by six free 
parameters: 1-2-1 refers to the way the neural network is constructed, by iterating recursively the 
response function g(x) = 1+e g_ /3;r on nodes arranged in layers which feed forward to the next layer, 
with the first (last) layer containing the input (output) variables. 

In Refs. [38j-[41j PDFs are parametrized using 2-5-3-1 neural networks, with 37 free parameters (the 
input has two variables because x and In a; are treated as two independent inputs, thereby increasing 
the redundancy of the parametrization) . The six light flavours and antiflavours are parametrized and 
the gluon are parametrized in this way, so that the total number of parameters is 37 x 7 = 259, thus 
rather larger than the typical numbers used when dealing with parametrizations of the form Eq. (126j) . 
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Figure 9: Left: 25 gluon replicas based on a neural network parametrization, together with their one-cr 
range, compared to a toy gluon distribution of the form Eq. (j26[) with g{x) = 1 and typical values 
of the parameters a and /3 (from Ref. |41]). Right: gluon replicas based on a parametrization on a 
basis of Chebyshev polynomials together with their one-cr range; subsequent plots correspond to an 
increasingly high penalty proportional to the length of the fitted curve, (from Ref. [24] ) . 



Such a large number of parameters clearly reduces considerably the risk of a parametrization bias, 
but it poses the problem that if the best fit is determined as the absolute minimum of the x 2 one may 
end up fitting data fluctuations, which is clearly not desirable. Even if these fluctuations average out 
when averaging over Monte Carlo replicas this would be a very inefficient way of proceeding. 

The advantage of a neural network parametrization can be understood from Fig. [9l where a gluon 
distribution determined using neural networks is compared to the simplest version of parametrization 
Eq. (|26p . and also to a very flexible parametrization based on orthogonal polynomials. The neural 
network gluon distribution shown in Fig. [9] corresponds to N rep = 25 replicas from the Monte Carlo 
set of Ref. [UJ, and it is displayed along with the average and one-cr contour computed from the set. 
On the same plot a parametrization of the form Eq. (|26p is also shown, with typical values of the 
parameters a and /3, and with g(x) = 1. It is compared to a set of Monte Carlo replicas of the gluon 
which were constructed in Ref. [23] by expanding the gluon distribution on a basis of 15 independent 
Chebyshev polynomials, while also imposing an increasing penalty p to fits with large arc-length 
(and thus more oscillations). The fits based on orthogonal polynomials display large uncontrolled 
oscillations which are only tamed by appropriately tuning the length penalty. The fits based on neural 
networks, despite having a number of free parameters which is more than double than those using 
orthogonal polynomials, do not display a similarly unstable behaviour, even though they do show 
considerable flexibility, and in fact the ensuing one-cr band, though accounting well with its width for 
the functional freedom is actually quite stable. 

The best fit is instead determined using a cross-validation method (see Fig. HOD . Namely, the data 
are randomly divided into a training and a validation sample. The x 2 1S computed both for the data 
in the training sample and those in the validation sample. Only the training x 2 is minimized, but the 
validation x 2 is also monitored as the minimization proceeds. The best fit is defined as the point at 
which the validation x 2 stops improving even though the training x 2 may keep improving: this is the 
point at which one is starting to fit the statistical noise of the training sample. In order to ensure lack 
of bias, the partitioning of the data is done randomly in a different way for each data replica. Also, 
in practice, in order to minimize the effect of random fluctuations in the data (or of the minimization 
algorithm) the stopping criterion must be imposed after a suitable averaging, such as for instance the 
moving average of values of the x 2 found in the last N s iterations of the minimization algorithm. 
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Figure 10: Determination of the best-fit by cross-validation: the x 2 of the fit to points in the training 
set (light blue) and the validation set (dark blue) are shown as a function of the number of iterations 
of the minimization algorithm (left plots), but only the x 2 for training data is minimized. The data 
are shown in the right plot, along with the best fit (red points without error band). The upper plots 
show the best fit at stopping point (optimal fit) and the x 2 up to this point, the lower plots show the 
X 2 up to and the best fit at an "over learning" point. 



4 From data to PDFs 

Once a parton parametrization and a methodology have been chosen, the determination of a PDF set 
relies on the choice of a set of physical observables. The problem is that, even after projecting the 
problem on a finite-dimensional parameter space, we must still determine seven independent PDFs, 
which means that we need seven linearly independent pieces of information at fixed scale for each 
value of x. For instance, a determination of deep-inelastic structure functions F\ and F3 for charged- 
current deep-inelastic scattering provides, according to Eq. (|15h four independent linear combinations 
of quark distributions (if can be distinguished), with two more linear combinations provided by 
neutral current structure functions. All individual light quark and antiquark flavours then can be 
determined by linear combination. This situation would be realistic at a neutrino factory with both 
neutrino and antineutrino beams and the possibility of identifying the charge of the final state lepton 
on an event-by-event basis \42\ H3] . 

Unfortunately, this theoretically and phenomenologically very clean option is at best far in the 
future, so at present the information on individual PDFs can only be achieved by combining in- 
formation from different processes into so-called "global" fits. The idea is that, even though each 



17 



electroproduction or hadroproduction observable depends on all PDFs through the factorization for- 
mulae Eqs. (|2ll0p . inclusion of specific processes or combination of processes may give a specific handle 
on individual PDFs or combinations of PDFs on which it depends most strongly (typically, through 
its leading order form). We will now review one at a time each of these individual handles on PDF, 
then briefly discuss how they are combined in modern more or less global fits. 

4.1 Isospin singlet and triplet 

Neutral current deep- inelastic (DIS) structure function data only provide a determination of the 
charge-conjugation even combination + q~i of quarks and antiquarks, for each quark flavor i. Specif- 
ically, photon DIS data only determine the fixed combination in which each flavor is weighted by the 
square of the electric charge, see Eq. (115p . However, one may separate off the isospin triplet and singlet 
components by considering DIS on both proton and deuteron targets, assuming that the deuterium 
structure function is simply the incoherent sum of the proton and neutron ones F$ = \{F P + ^2) ( U P 
to small nuclear corrections which can be accounted for through models, such as that of Ref. [44j ) . 
and then using isospin symmetry to relate the quark and antiquark distributions of the proton and 
neutron: 

U P(x,Q 2 ) = d n (x,Q 2 ); d p (x,Q 2 ) = u n (x,Q 2 ). (35) 

One then has 

F p (x, Q2) - Fi(x, Q 2 ) = i [{u p + vP) - [dP + dP)] [1 + 0(a s )} (36) 

so that the difference of proton and deuteron structure functions provides a leading-order handle on 
the isospin triplet combination 

T 3 (x, Q 2 ) = u(x, Q 2 ) + u(x, Q 2 ) - [d(x, Q 2 ) + d(x, Q 2 )] . (37) 

Note that even beyond leading order F% — F2 only depends on T3, which can thus be determined 
without further assumptions: a theoretically very clean, though necessarily not especially accurate 
determination |45j . 

4.2 Light quarks and antiquarks 

Modern DIS data are available over a wide range of values of Q 2 , extending well into the region where 
the CC contributions are sizable: in fact HERA-I data are available both for CC and NC scattering, 
both with electron and positron beams. Unfortunately, collider data only provide a fixed combination 
Eq. (jlip of the structure functions F\ and F%, because for given x and Q 2 Eq. (jlip implies that y can be 
varied only by changing the center-of-mass energy of the hadron-lepton collision. Hence, HERA data 
only provide three independent combinations of structure functions and thus of parton distributions 
(NC and CC with positively or negatively charged leptons). However, a fourth combination is provided 
because the Q 2 dependence of the 7* and Z contributions to NC scattering is different (see Eq. (|17p. It 
follows that the very precise HERA data can determine four independent linear combinations of PDFs, 
which can be chosen as the two lightest flavours and antiflavours, with strangeness then determined 
by assumption. 

Even without a neutrino factory, data on neutrino deep-inelastic scattering are available, but 
typically on approximately isoscalar nuclear targets. Because the energy of the neutrino beam typically 
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Figure 11: Impact of Drell-Yan data on the NNPDF2.0 [UJ parton fit. Left: the total valence 
distribution; right: the light antiquark asymmetry. 



has a (more or less broad) spectrum, the value of y Eq. (|14p is not fixed, and the contributions of F\ 
and F 3 to the cross section can be disentangled. On an isoscalar target at leading order 

F% = x(u + u + d + d + 2s + 2c) + 0{a s ), Ff = x(u + u + d + d + 2s + 2c) + 0(a s ), 

F% = u-u + d- d+2s -2c + 0(a s ), F 3 P = u - u + d - d - 2s + 2c + 0(a s ), (38) 

so neutrino data provide an accurate handle on the total valence component 

n s 

V(x, Q 2 ) = Q 2 ) - qi(x, Q 2 )). (39) 

i= 

A more direct determination of the light flavour decomposition can be obtained by exploiting the 
fact that the Drell-Yan cross section probes various parton combinations, which can be selected by 
looking at different final states. In particular one can notice [46J that for neutral-current Drell-Yan if 
both data on proton and neutron (or deuteron) targets are available, using isospin Eq. (|35j) one gets 
at leading order 

a pn iyPdP + -d P U P 

HZ ~ \uvuv + \dPdv + ° {as) + h6aVier qUarkS ' (40) 

where we have omitted the dependence on the kinematic variables, which at leading order is as 
in Eq. (|18p . As discussed there, if the rapidity distribution is measured, the leading order partonic 
kinematic is completely fixed: for given y and Q 2 only partons with x\, X2 given by Eq. ([S]) contribute. 
Here "heavier quarks" denote strange and heavier flavours, which give a smaller contribution at least 
in the region of x > 0.1 in which most of the contribution to the sum rule integrals Eq. (|25l25j) is 
concentrated. 

In particular, because of the sum rule Eq. (I24jl . in the the region which gives the dominant contri- 
bution to the integral (the "valence" region x > 0.1) the up distribution is roughly twice as large as 
the down distribution (assuming u ~ d) so the first term in both the numerator and the denominator 
of Eq. (|40p gives the dominant contribution, and the ratio reduces to ~ Hence this particular 
combination of cross sections provides a sensitive probe of the u/d ratio: indeed, it has been used to 
provide first evidence that this ratio, though of order one, deviates from unity |47l I48j . 
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In the charged current case, one may exploit the fact that using charge-conjugation symmetry to 
relate the p and p PDFs 

= i (4i) 



at leading order one gets 

a% + _ vP(x 1 )dP{x 2 ) + dP( Xl )uP(x 2 ) 



<?w- dP(x 1 ) U P(x 2 ) + uP( Xl )dP(x 2 ] 



+ 0(a s ) + Cabibbo suppressed + heavy quarks (42) 



where heavy quarks denotes charm and heavier flavours. In writing Eq. (|42p we have assumed that 
cross sections are differential in rapidity. If the kinematics is chosen in such such a way that X{ are 
in the "valence" region, in which quark distributions are sizably larger than antiquark ones, the ratio 
Eq. (j4"2j) is mostly sensitive to the light quark ratio u/d [HJJ[50] and indeed it has been used to provide 
the first accurate determinations of it [51] , 

The sizable impact of Drell-Yan data on a PDF fit is demonstrated in Fig. [TTl where we compare 
the value and uncertainty of PDF combinations which are sensitive to the light flavour decomposition 
before and after inclusion of Drell-Yan data in a DIS fit, namely the total valence Eq. (j39|) and the 
light sea asymmetry 

A s (x,Q 2 ) = d(x,Q 2 ) -u(x,Q 2 ). (43) 

The DIS fit includes both the fixed-target proton and neutron data (which thus determine well the 
isotriplet component and give a handle on the singlet-triplet separation), the precise HERA data 
(which give a handle on each individual light flavour and antiflavour) , and several neutrino data 
(which give a handle on the total valence component). The Drell-Yan data included contain both 
proton and deuteron fixed target 7* production data, and W and Z production. It is apparent that 
the accuracy on the valence, which is already quite good in the DIS-only fit, is reduced by a large 
factor by the inclusion of Drell-Yan data, and the effect is even more impressive on the light antiquark 
asymmetry which, despite the accuracy of the HERA data, is only determined with large uncertainties 
by DIS data. 
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Figure 13: The strange distributions s + = s + s (left) and s = s — s (right) determined in a fit to 
DIS including dimuon data (NNPDF1.2 [38]) and not including them (NNPDFI.I [52]). A fit without 
dimuon data where strangeness is fixed by assumption ((NNPDF1.0 [53]) is also shown. 



The strong impact of W and Z production data can be seen quantitatively by computing the 
correlation coefficient between the W and Z cross section and individual parton distributions, which 
can be computed both in a Hessian approach using standard error propagation, or in a Monte Carlo 
approach from the covariance of the cross section and the parton distribution over the Monte Carlo 
sample. Results obtained in the Hessian approach using CTEQ6.6 [27] are shown in Fig|12t correlations 
are quite large, even though results shown here are obtained using the total cross section, which is a 
much less sensitive probe of PDFs than the rapidity distributions discussed above. 

4.3 Strangeness 

The determination of strangeness is nontrivial because, of course, it has the same electroweak couplings 
as the down distribution, while it is typically smaller than it (except at small x where all PDFs are 
the same size, as discussed in Sect. 12.2]) . The only way of determining it accurately from deep-inelastic 
scattering data is to include semi-inclusive information. A simple way of doing this is to use data for 
neutrino deep-inelastic charm production (known as dimuon production, because charm is tagged by 
the muon from its decay together with the muon due to the charged current neutrino interaction). At 
leading order the structure functions are then just 

F^ c (x,Q 2 ) = xF^ c (x,Q 2 ) = 2x {\V cd \ 2 d(x) + \V CS \ 2 s(x) + \V cb \ 2 b(x)) + 0(a 2 ), 
F^ c {x,Q 2 ) = -xF^' c (x,Q 2 ) = 2x {\V cd \ 2 d(x) + \V CS \ 2 s(x) + \V cb \ 2 b(x)) + 0(a 2 ), (44) 

so up to CKM suppressed terms they measure strangeness directly. 

In Fig. [13] the behaviour upon inclusion of dimuon data of a fit to a set of DIS data which includes 
both neutrino and HERA data is shown: it is clear that before inclusion of the dimuon data the fit 
(NNPDFI.I [52]) cannot determine either of the two strange combinations 

s ± (x,Q 2 ) = s + (x,Q 2 )±s-(x,Q 2 ), (45) 

but after their inclusion it determines both, though with limited accuracy due to the limited accuracy 
and kinematic coverage of the available dimuon data. In this plot, we also show the result one obtains 
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for strangeness if one simply assumes it to be proportional to the light quark sea, i.e. if by assumption 
one sets s~ = and s + = \{u + d). This is often done in PDF determinations based on DIS data 
only: the result is then misleadingly accurate. This comparison should thus be taken as a warning 
that, when using PDF sets in which some PDFs are fixed by assumption, some uncertainties may be 
significantly underestimated. 

Of course the Drell-Yan data discussed above also constrain strangeness. Specifically, the cross 
section ratio Eq. (j42]) receives a contribution from strange and charm quarks which, up to CKM matrix 
elements, is identical to the contribution from down and up quarks respectively. Well above charm 
threshold this contribution is sizable, so comparing Drell-Yan data above and below charm threshold 
potentially leads to a rather accurate determination of strangeness. Indeed, in Fig. Q31 we show the 
impact of including Drell-Yan data in a fit with DIS data only (same pair of fits already shown in 
Fig. fTTj) . The DIS dataset contains dimuon data, and it is similar to the dataset on which the fit of 
Fig. [13] is based, from which it mostly differs because of improvements in the HERA data and in fit 
methodology; however, the Drell-Yan data have a visible impact on the total strangeness s + , and lead 
to a very striking improvement in the determination of the strangeness asymmetry s~. 

4.4 Gluons 

The determination of the gluon distribution is nontrivial because the gluon does not couple to elec- 
troweak final states. It does, however, mix at leading order through perturbative evolution: so, even 
using parton-model (i.e. 0(a®)) expressions for cross sections and structure functions, the gluon does 
determine their scale dependence. Indeed 

jFl(N,Q 2 ) = [ lqq (N)Fi + 2n nqg (N)g(N,Q 2 )] + 0{a% (46) 

where by F2(N,Q 2 ) we denote the Mellin moments Eq. (|20p of the singlet component (defined as in 
Eq. (|23l) ) of the Fi structure function. 

It follows that the gluon is mostly determined by scaling violations, or by its coupling to strongly- 
interacting final-states, i.e. jets. The main shortcoming of the determination from scaling violations is 
that, as already pointed out in Sect. 12.21 the gluon only couples strongly to other PDFs for sufficiently 
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Figure 15: Impact of jet data on the gluon distributions at small x (left) and large x (right) using the 
NNPDF2.0 [31] parton fit. 



small x: for instance, Fig. [5] shows clearly that for N > 2 the j qg term rapidly becomes negligible in 
comparison to the jgg term. On the other hand, the gluon distribution is expected to be quite small 
at large x, and, as also discussed in Sect. 12.21 to further shift towards smaller x as the scale increases. 
Hence, the large x gluon is likely to be small and affected by large uncertainties, which can only be 
reduced by looking at hadronic (jet) final states. 

Indeed, in Fig. [15] we show the effect of the inclusion of jet data in a PDF fit based on DIS data. 
At small x there is essentially no effect: scaling violations are sufficient to determine the gluon quite 
accurately. At large x, even though the determination of the gluon from scaling violations is reasonably 
accurate, its accuracy is still quite significantly improved by the inclusion of jet data. A feature of 
this plot which is worth noting is the beautiful consistency of these two determinations. This is an 
extremely strong consistency check for the perturbative QCD framework: the gluon determined from 
scaling violation and evolved up to the much higher jet scale is in perfect agreement with the jet data, 
and indeed the best accuracy is obtained combining the two determinations. 

4.5 Global fits 

It is clear that the wider the number of different processes, the greater the amount of information 
which is being used in the determination of PDFs. The price to pay for this, as we will discuss in 
Sect. [5] is that the determination of PDFs and especially their uncertainties from diverse and possibly 
inconsistent data might be nontrivial — at the very least, it is going to be computationally intensive. 
Current global fits use all the processed discussed so far in order to control as much as possible different 
aspects of PDFs. 

The dataset used in one such fit (NNPDF2.0 [41] ) is shown in Fig. [161 Different data in this 
set constrain different aspects of PDFs, along the lines of the preceding discussion, in a way which, 
referring to this specific dataset, can be summarized as follows: 

• information on the overall shape of quarks and gluons at medium x as well as on the isosinglet- 
isotriplet separation come from fixed-target DIS data on proton and deuterium targets (domi- 
nated by 7* exchange), denoted in the plot as NMC [S3], NMCpd [55], SLAC [56] and BCDMS [57] : 

• an accurate determination of the behaviour of the gluon and quark at small x (where it is 
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Figure 16: The data set used in a typical global PDF determination (NNPDF2.0 |41j). 



dominated by the singlet) and by individual light flavours at medium x (where CC and NC 
data play a role in separating individual flavours) is found from the very precise HERA CC and 
NC data denoted in the plot as HERAI-AV [58] . which were obtained by combining the ZEUS 
and HI data from the HERA-I run. More recent HERA-II ZEUS NC [59] and CC [60J data 
(ZEUS-H2) are also used. 

• information on the flavour separation at small x comes from Tevatron Drell-Yan data (in par- 
ticular the W asymmetry, as discussed above) denoted in the plot as CDFWASY [61J, CD- 
FZRAP [62], DOZRAP [63]; 

• the flavour separation at medium x is mostly controlled by the Tevatron Drell-Yan data on fixed 
proton and nucleus target, DYE605 |M] and DYE866 [651 E3 E7] in the figure. 

• the total valence component is constrained by the neutrino inclusive DIS data, denoted as 
CHORUS [68] in the plot; 

• strangeness is controlled by neutrino dimuon data (NTVDMV |69} I70| ) , as well as by the 
interplay of the W and Z production data with lower scale DIS and Drell-Yan data; 

• the large x gluon, already determined by DIS scaling violations, is further constrained by Teva- 
tron jet data (CDFR2KT [71], D0R2CON [72]). 

Other global fits may differ in some detail, such as the specific choice of experiments or the addition 
or subtraction of some set of data, but are mostly based on datasets constructed on the basis of a 
similar logic. Smaller datasets, typically a subset of the above, are also considered. 
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Figure 17: Decrease of the x 2 f° r various dataset entering a global fit plotted as a function of the 
increase in \ 2 of the global fit when moving away from the global minimum, (from Ref. |37j ) . 



Future improvements on some of these processes, in particular Drell-Yan (including W and Z) 
production and jet production will certainly come from the LHC, both because of the higher available 
center-of-mass energy (compare Fig. [3]), and because of the higher statistics which will be accumu- 
lated once higher or design luminosity are reached. Some other processes which are likely to become 
important at the LHC are prompt photon and heavy quark production, as well as Higgs production 
(if the Higgs is found and understood) , all of which are sensitive probes of the gluon distribution. We 
will briefly come back on these issues in Sect. 16.21 after discussing the current main difficulty in the 
understanding of PDFs, namely, the treatment of PDF uncertainties. 

5 PDF uncertainties 



The accurate determination of PDF uncertainties is clearly necessary if one wants to be able to obtain 
meaningful predictions from the factorized QCD expressions of Sect.[2j Because PDFs are determined 
by comparing QCD predictions to the data, as discussed in Sect. [3l any uncertainty in the theory 
used to obtain these predictions will propagate onto the PDFs themselves. Such uncertainties include 
genuine theoretical uncertainties, such as lack of knowledge of higher-order perturbative corrections: 
these, generally, do not have a simple statistical interpretation (and in particular they are generally 
not gaussian). They also include lack of knowledge of parameters in the theory, in particular the 
value of the strong coupling constant a s and the heavy quark masses m c and nib, which generally 
do follow gaussian statistics. The treatment of these uncertainties is in principle straightforward, in 
the sense that all one has to do is propagate them onto the PDFs — their effect on PDFs is no 
different from their effect on the calculation of a physical observable, and PDFs do not entail any 
new problem. For example, if it is agreed that higher order corrections on cross sections can be 
conventionally estimated by varying renormalization and factorization scales in a certain range, to 
be interpreted, say, as a 90% confidence level with flat distribution, the associate PDF uncertainty is 
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Figure 18: Determination of the tolerance for a global fit (from Ref. [76J). The distance on the y axis 
is given in units of Eq. (|29p of the global fit. The interval shown for each experiment corresponds 
to a 90% confidence level. The band shown, corresponding to Ax 2 = 100 is just wide enough to 
accommodate the ZEUS (upper variation) and CCFR2 (lower variation) experiments. 



simply found by repeating the PDF determination while performing this variation. We will refer to 
these as "theoretical uncertainties", and come back to them in Sect. 16.11 

On top of these, however, PDFs are affected by statistical uncertainties which are related to the 
way the information contained in the data is propagated onto a PDF determination following the 
process summarized in Fig. [7J The determination of these uncertainties is highly nontrivial because, 
as discussed in Sect. O the desired final outcome of this process is the determination of a probability 
distribution in a space of functions: these uncertainties are supposed to behave as genuine statistical 
uncertainties, with a well-defined probability distribution, and it is not obvious how to make sure, and 
then verify, that this is the case. These will be referred to as "PDF uncertainties" for short. 

First attempts to determine PDF sets which include PDF uncertainties are only quite recent [73|, 
l74"l [75] ; they immediately met with the difficulty that as soon as wide enough datasets (such as those 
discussed in Sect. 14.5) ) are fitted, a standard statistical approach does not seem to be adequate [761177]. 
Furthermore, results obtained for relevant LHC processes such as Higgs production using various 
different sets |78j do not always agree well with each other. On both of these issues there has been 
considerable progress over the last several years. On the one hand, the understanding of statistical 
issues related to PDF uncertainties has advanced considerably, and it will be reviewed in the remainder 
of this section. On the other hand, existing PDF determination show a distinct convergence as various 
phenomenological and theoretical issues are addressed and understood, as we will see in Sect. 16.21 

5.1 Tolerance 

Available fits to wide enough sets of data based on the Hessian approach and the "standard" parton 
parametrization Eq. (I26D discussed in Sect. ItTTl run into the difficulty that the best-fit is not simulta- 
neously a best-fit for individual datasets. Specifically, one can test for the possibility that the x 2 °f 
the fit to individual datasets entering the global fit may be improved by moving away from the global 
minimum by introducing Lagrange multipliers to select which dataset to minimize [37]. Results, shown 
in Fig. \T7\ are disquieting: not only the minima of individual experiments do not coincide with the 
global minimum but some of these minima seem to deviate much more than one might expect on the 
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Figure 19: Dynamical tolerance for the MSTW08 PDF fit. Left: the tolerance interval for the 13 
eigenvector; the inner and outer uncertainty bands correspond for each experiment to the 68%c.l. and 
90%c.l. ranges. Right: the tolerance interval for each eigenvector, along with the experiments which 
determine it; the T 2 = 50 and T 2 = 100 previously [76l[77] adopted are also shown (from Ref. [30J ) . 



basis of statistical fluctuations, and there even seem to be runaway directions for some experiments. 

This suggests that likelihood contours (for example one-cr) for the global fit can only be determined 
while simultaneously testing for the degree of agreement of individual experiments with it. The way 
this is done is by introducing the concept of "tolerance", defined as follows [76J. First, the Hessian 
matrix is diagonalized. Next, one moves the value of each eigenvector away from the minimum 
of the global fit in either direction, and one computes the x 2 °f each experiment. Then, for each 
experiment one determines both the position of the minimum of the \ 2 and the one-cr interval about 
it (corresponding to the Ax 2 = 1 variation about the minimum), or equivalently the 90% confidence 
level (obtained by rescaling the former interval by the factor Cgo = 1.64485... [S]). Finally, one 
takes the envelope of the error bands for individual experiments at the desired confidence level (c.l., 
henceforth). For example, at the 90% c.l. one determines the range of variation in parameter space 
along this eigenvector about the minimum such that the 90% c.l. interval of each experiment overlaps 
with this range. This gives a tolerance interval for the given eigenvector. The width of this interval can 
be measured in units of the variation of the x 2 °f the global fit. This defines a tolerance: T 2 = Ax 2 
is the width of the envelope (see Fig. [TBI) . 

The 90% c.l. is finally taken to be Ax 2 = T 2 instead of Ax 2 = Cg (equivalently, the one a contour 
is Ax 2 = T 2 /cqq). The logic behind this is that PDFs should allow one to obtain predictions for new 
processes at the desired confidence level: for instance, the actual result for a new measurement should 
have a 68% chance of actually falling into the predicted one-cr band. If new experiments behave as the 
experiments which are already included in the fit do on average, then this will happen for the one-cr 
band defined in this way, while if the one-cr band were defined on the basis of standard statistics the 
chances of the measurements falling outside the band would be much higher. It should be stressed 
that therefore a tolerance analysis is required in order for a fit based on this methodology to be reliable 
(unless the dataset adopted is very small and/or consistent). 

In Ref. [76] it was found that in practice T 2 = 100 worked for all eigenvalues and experiments at 
90% c.l. for the dataset and fit considered there, corresponding to A% 2 = T 2 /cg » 37 at one sigma. 
A similar analysis in Ref. [77] found instead T 2 = 50. Taken at face value, this would imply that all 
experimental uncertainties have been underestimated by a factor of about T/cqq « 6 (for T 2 = 100) 
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Figure 20: One-cr interval computed from a distributions of 100 replicas of the gluon distribution. 

or T/cqo ~ 4 (for T 2 = 50) . While some uncertainty underestimation is possible, such a large factor 
is at best puzzling, and thus its origin deserves further investigation. 

The concept of tolerance was subsequently refined, by suggesting that instead of a global tolerance 
value for all eigenvalues, a different tolerance value, determined as above, be adopted along each 
eigenvector direction. This is called "dynamical" tolerance [30]. Proceeding in this way, one finds a 
tolerance T < 6.5, with most values being in the range 2 < T < 5, so the large tolerance problem 
is somewhat mitigated. Also, in this approach it is possible to trace which individual experiment 
is controlling the tolerance range for each eigenvalue. This, together with the expression of the 
eigenvector in terms of the original parameters, provides insight on the relation between data and 
PDF parameters and their mutual consistency. Such an analysis is displayed in Fig. [TUl where both 
the tolerance analysis for one specific eigenvector, and then the experiments and corresponding band 
which control the tolerance interval for each eigenvector. A related but different refinement was 
suggested in Ref. [28], based on the idea of determining the band such that each experiment agrees 
with the global fit to say 90% c.l. by means of a sharply rising penalty term in the global x 2 based 
on the x 2 °f each experiment. 

It is interesting to contrast this treatment of uncertainties in the Hessian approach with "stan- 
dard" parametrization with the Monte Carlo approach together with neural network parametrization 
discussed in Sect. 13.2.21 In that approach, uncertainty bands corresponding to any given confidence 
level can be computed directly from the Monte Carlo sample: the one-cr interval is just the standard 
deviation of the sample, and one may even check whether it indeed corresponds to the central 68% of 
the distributions of PDF replicas. This is shown in Fig. [20] for the gluon distribution (from Ref. |38j): 
in this case (and in fact [41] in most cases) the one-cr and 68% c.l. intervals coincide. In a Monte 
Carlo approach, whether or not the fits behave consistently when comparing fit results to new data, 
and then including these new data into the fit, can be verified a posteriori by performing statistical 
tests on the fit results. These tests were performed successfully for the fits of Refs. [38\ [53"1 141] . 

The question of the appropriate range in global x 2 which corresponds to one sigma is thus side- 
stepped. In principle, it can be answered a posteriori: in a Monte Carlo approach, the x 2 °f the mean 
is a property of the Monte Carlo sample, so one could compute the one-cr interval from the sample 
itself. In practice, it is nontrivial to do this accurately because, as explained in Sect. I3T1 the x 2 has 
fluctuations of order N^ at , and in a Monte Carlo approach these fluctuations take place replica by 



28 




2 4 6 

cr 



Figure 21: Distribution of discrepancies between each experiment entering a global fit, and the global 
best fit. The solid (red) curve is the standard gaussian distribution, the dashed (green) curve is 
a gaussian distribution with a rescaled by a factor 1.88, the dotted (blue) curve is a Lorentzian 
distribution (from Ref. |79j ) 



replica, so one needs a very large sample to determine the x 2 accurately. 

However, the issues which may be responsible for the large tolerances can be addressed both in a 
Hessian and in a Monte Carlo approach as we will now discuss. 

5.2 Parametrization bias and data incompatibility 

The large tolerance values discussed in Sect. I5.ll are, by definition, a manifestation of the poor mutual 
compatibility of the experiments that go into the global fit. One possible explanation for this is that 
experiments are genuinely incompatible with each other within their stated uncertainties, i.e. that 
their published uncertainties are underestimated. We will refer to this possible explanation as "data 
incompatibility" . 

Another possible explanation is that the way uncertainties are propagated from experiments onto 
PDFs leads to underestimating the uncertainty in the latter. For example, assume that experiment A 
does not depend on some PDF parameter, and that one determines PDFs from this experiment, but 
instead of leaving the undetermined parameter free, one fixes it in some arbitrary way. If the ensuing 
PDF is then used to predict another experiment B which happens to depend on the undetermined 
parameter the likelihood of results being in agreement with the prediction will not depend on statistics, 
but rather in the arbitrary way the parameter has been fixed. We will refer to this as "parametrization 
bias". 

Of course other options are possible: for example, that the theory which is being used is not ade- 
quate. In the latter case, however, one would have to find a convincing argument why this theoretical 
inadequacy has not been seen elsewhere. 

Data incompatibility in the Hessian approach was recently studied in a quantitative way in Ref. [79], 
exploiting the observation [80] that once the x 2 has been written in the form of Eq. (]32p one can perform 
a further linear transformation of the parameters which preserves this form, while also diagonalizing 
the contribution to the x 2 from some specific subset of data. After this simultaneous diagonalization, 
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Figure 22: Tests of data compatibility by changing the order of inclusion of data in fits with different 
datasets, based on the NNPDF2.0 [H] PDF determination. Top: effect on the gluon distribution of 
the inclusion of jet data in a fit to DIS data only (left) or on a fit to DIS+Drell-Yan data (right). 
Bottom: effect on the sea asymmetry Eq. (I43D of the inclusion of Drell-Yan data in a fit to DIS data 
only (left) or on a fit to DIS+jet data (right). 



the x 2 is written as the sum of a contribution from the data in the given subset and the rest: the 
distance of the minima of these two contributions to the x 2 i n units of the corresponding standard 
deviation measures the compatibility of the given subset of data with the rest of the global dataset. 
The idea is then to study the distribution of such distances, in all cases in which the experiment does 
contribute significantly to the global minimum. If experimental uncertainties are correctly estimated, 
they should be gaussianly distributed. The results of this analysis, shown in Fig. [2TJ suggest that 
the distribution of discrepancies deviates significantly from a gaussian distribution, and that if it is 
fitted to a gaussian its uncertainty should be rescaled by about a factor 2. This suggests uncertainty 
underestimation by a similar factor, which corresponds to a value of the tolerance for 90% c.l. of order 
of T 2 ~ 10. 

This suggest that data incompatibility can explain only in part the need for large tolerance. Further 
evidence that data incompatibility is at most moderate can be obtained in a Monte Carlo approach, 
by comparing the effect of the subsequent inclusion of different datasets into a fit. Indeed, if some 
datasets were incompatible with others, then the effect of their inclusion in the global fit would change 
according to whether the global fit already includes the data with which they are incompatible or not. 
Assume for example that the gluon determined from jets is compatible with that found exploiting 
scaling violations in DIS data, but less compatible with that found from scaling violations in Drell- 
Yan: then, inclusion of jet data in a pure DIS fit would have a different effect than their inclusion in a 
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Figure 23: Parametrization bias. Left: the one-cr range for parameter z when parameter y is kept 
fixed away from the best-fit value corresponds to a range which is larger than Ax 2 = 1. Right: 
various gluons (dashed curves) based on a general parametrization which lead to the same or better 
fit quality as the best-fit gluon of Ref. [27], compared to the Ax 2 = 10 band about the latter gluon 
(from Ref. [EI]). 



fit which contains both DIS and Drell-Yan. When such tests are performed |41j no evidence for data 
incompatibility is found, as demonstrated in Fig. [22]) . 

Let us now turn to the possibility that parameterization bias may be responsible for the effect. 
The way this could happen in a Hessian approach was recently exemplified in Ref. [81] . Assume a 
relevant parameter on which PDFs depend is not fitted, but rather fixed by assumption at a value 
which is away from its best-fit. Then, clearly (see Fig. I23j) the one-cr range for the other parameters 
when this parameter is kept fixed corresponds to a variation of x 2 which is greater than the standard 
Ax 2 found when moving away from the minimum. A first estimate of the possible size of this effect 
was also provided in Ref. [81] by simply repeating the PDF fit of Ref. [27], but with a much more 
general parametrization, based on expanding the gluon on a basis of orthogonal polynomial, analogous 
to that of Ref. [23] shown in Fig. [9] With this more general parametrization, fits whose x 2 is similar 
to or better than the best-fit x 2 °f the more restrictive parametrization are found to span a band 
which corresponds roughly to the Ax 2 = 10 range for the restrictive parametrization. So this suggests 
a tolerance of at least T 2 = 10 just to account for the bias on the gluon shape imposed by the 
parametrization of Ref. [27] 

The issue of parton parametrization and bias thus deserves further investigation. First, one may 
ask whether the gaussian assumption is by itself a source of bias. This has been investigated in 
Ref. [23J, using a Monte Carlo approach together with a standard parton parametrization. Data 
replicas based on the HERA DIS data (Fj(n) of Fig. [7J have been generated either using a gaussian 
or a lognormal distribution (see Fig. I24j) . and used for a PDF fit based on the "standard" functional 
form Eq. (I26p , In each case, both the Monte Carlo and Hessian uncertainty are computed. Results 
are shown in Fig. [23J the lognormal and gaussian results are essentially indistinguishable, and so are 
the Hessian and Monte Carlo uncertainties. The choice of the probability distribution of the data 
does not seem to play any major role, as one might have expected from the central limit theorem: 
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Figure 24: Comparison of fits based on gaussianly or log-normally distributed data. From left to 
right: Comparison of the gaussian and lognormal distribution. Gluon determined from lognormal 
data. Gluon determined from gaussian data. In the latter two cases, the plot shows individual replicas 
(green) Hessian uncertainty band (black) and Monte Carlo uncertainty band (red) .(From Ref. |23j). 
Gluon determined from the same gaussian data, but using the neural network parametrization of 
Ref. [53] (from Ref. [82]) 



with so many data, everything looks gaussian. This also provide a nice visual demonstration of the 
equivalence of Hessian and Monte Carlo uncertainty computation in the Gaussian case. 

On the other hand, in the same figure we also show the gluon obtained in a fit to exactly the 
same data, but using the neural network functional form and associate cross-validation methodology 
of Ref. |53| . It is clear that the uncertainty is now much wider. This suggests that it is the form of 
parametrization which plays a dominant role, rather than the form of the probability distribution. 

The issue has been investigated further in the HERAPDF [29] PDF fits, where the standard 
Ax 2 = 1 PDF uncertainty based on a "standard" functional form Eq. (|26p has been supplemented by 
a further parametrization uncertainty, obtained by varying the assumed functional form (in particular, 
the large x behaviour, the number of terms in the polynomial Eq. (|27p . and the assumptions on 
strangeness which is not fitted). It is clear from Fig. [25] that this leads to a sizable enlargement of the 
PDF uncertainty band. 

The Monte Carlo approach together with neural network offers an interesting way of searching 
for the origin of uncertainties, in that different sources of uncertainty can be switched on and off one 
at a time. In particular, one may perform the following exercise. First, one freezes the generation 
of data replicas, and one takes each replica dataset equal to the central values of the data. Recall 
from Sect. [372721 that each replica is fitted to a different, randomly chosen subset of the data. Hence, 
all datasets Fj of Fig. [7] are now the same, and only the way they are partitioned in validation and 
training sets changes between replicas. Each PDF replica is thus obtained as the fit to a different 
partition of the central experimental data. The (square) fluctuation of the data are reduced by a 
factor two: instead of having replicas which fluctuate about experimental data which in turn fluctuate 
about their "true" values, one only has different subsets of central data fluctuating about their true 
values. 

In Tab. [2] we compare some indicators of fit quality and results for a fit obtained in this way to 
those of the corresponding standard fit (using the fit of Ref. [38], based on DIS data). In particular, we 
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Figure 25: Comparison of statistical and parametrization uncertainties in the HERAPDF fit (model 
uncertainties denote what we call theoretical uncertainties), from Ref. [2"9"j . 





replicas 


central value 


fixed partition 


x 2 


1.32 


1.32 


~1.3 


(X Z )rcp 


2.79 ±0.24 


1.65 ±0.20 


~ 1.6 ±0.2 


(o")dat 


0.039 


0.035 


~0.03 



Table 2: Values of the \ 2 f° r the best fit (first row), the average and standard deviation of the \ 2 
of individual replicas (second row), and the percentage uncertainty of the prediction averaged over 
all data points (third row) for the PDF determination of Ref. [38] (first column); the same but with 
all PDF replicas fitted to different partitions of the experimental central values (second column); the 
same but with all data replicas fitted to the same partition of the experimental central values (in the 
latter case the process has been repeated with 5 different choice of fixed partition and averaged) 

compare the \ 2 °f the best fit in either case: this is unchanged. However, the average \ 2 °f each replica 
fit is smaller by a factor two. This is as it should be: in both cases the best-fit is reproducing the same 
central best-fit value, but the fluctuation of replicas about it are now suppressed by a factor two, and 
thus the average \ 2 P er replica is reduced by approximately the same factor. The surprising result 
however is found when one computes the average percentage uncertainty in the prediction obtained in 
either case. This is determined as the percentage uncertainty of the prediction obtained from the final 
replica PDF set, for all datapoints included in the fit, averaged over datapoints. One might expect 
that having halved the fluctuations, the average uncertainty should be reduced by a factor v2; in 
actual fact, it is reduced by a much smaller amount. 

The origin of this state of affairs can be understood by performing an even more extreme text: 
one simply produces 100 replicas fitted to exactly the same partition of the central data. In this case, 
all F[ contain the same data, partitioned in the same way into training and validation sets. Naively 
one may think that this may lead to simply repeating 100 times the same fit. This is not necessarily 
the case because each replica is determined by initializing the neural networks at random, and then 
minimizing by means of an (equally random) genetic algorithm. Hence one starts each time from 
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Figure 26: The \ 2 f° r the training dataset and total dataset for two different replicas fitted to exactly 
the same training subset of the central values of the data of Ref. |38| . shown as a function of the 
number of iterations of the minimization algorithm. 



a different point in the very wide parameter space, and then the minimum is approached along a 
different path. Indeed, in Fig [2)] we show the x 2 profiles along the minimization, as a function of the 
number of iterations of the minimization algorithm, for two individual replicas: it is clear that even 
though the final \ 2 values are quite similar, the number of iterations and profiles that take there are 
quite different, thereby showing that the minimum is approached along different paths. 

In this case, in order to make sure that results do not depend on the particular partition that has 
been picked in the first place, the whole procedure is repeated five times with five different choice of 
starting partition and results are then averaged. Results are shown in Tab. [2] (they should be taken as 
indicative, because one should use rather more than five fixed partitions for accurate results). These 
results are quite surprizing. The \ 2 °f the best fit and the average over replicas are unchanged, and 
this is to be expected: it shows that indeed the five replicas chosen are not special. However, very 
surprizingly, the average uncertainty, which one might expect to be tiny, is more than 50% of that of 
the original fit. This is also seen by comparing results for PDFs (see Fig. [271 the uncertainty band is 
smaller, but of the same order of magnitude as that of the full fit. The inevitable conclusion is that a 
large fraction of the uncertainty band, probably more than half, does not depend on the fluctuations in 
the data. Rather, it is a consequence of the fact that there is an infinity of functions that provide fits of 
comparable quantity to the data. Different minimization profiles such as those shown in Fig. [26] land 
on somewhat different minima; the uncertainty of this fit is then a measure of the spread of this space 
of minima. Once understood that a sizable fraction is simply due to this "functional" uncertainty, it 
is clear why it is more difficult to capture with a fixed parametrization, which then requires a suitable 
tolerance in order to mimic it. 

6 Recent developments 

The state of the art in PDF determination is moving very fast and thus any attempt to review it would 
necessarily become obsolete quite rapidly: a recent review of the status of the field as of November 
2010 is in Ref. [83]. Here, in an attempt to discuss issues of somewhat less fleeting value, we will first 
briefly review theoretical uncertainties, which are the current frontier of PDF determination, then 
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Figure 27: Comparison of PDFs from the Monte Carlo set of Ref. [38] (NNPDF1.2, blue band) to 
PDFs determined using the same data and procedure, but fitting all replicas to the central data (green 
band). The PDFs from Ref. [77] (MRST2001E, yellow band) and Ref. [27J (CTEQ6.6, red band) are 
also shown for comparison. The PDFs shown are the gluon (top left), total valence Eq. (I39j) (top 
right), triplet Eq. (f37|) (bottom left) and total strangeness s + Eq. (|4"5j) (bottom right). 



summarize what progress has been made and what remains to be done in the determination of PDFs 
for the LHC in the years to come. 

6.1 Theoretical uncertainties 

As already mentioned, the PDF uncertainties discussed in Sect. [5] are the result of propagating into 
the space of PDFs the uncertainty on the data on which the PDF determination is based. Most 
of the effort has gone so far in their determination and understanding because they are likely to 
be at present the dominant uncertainty. However, it has been recently realized that in many cases 
uncertainties related to the theory used to extract PDFs from the data may be larger than one may 
think. These, as already mentioned, include both uncertainties in the theory itself (such as higher 
order corrections) but also uncertainties in the knowledge of the free parameters on the theory. 

The most obvious source of theoretical uncertainty is the value of the strong coupling a s . The 
PDF which depends most strongly on it is the gluon distribution, which, as discussed in Sect. 14.41 is 
largely determined by scaling violations: the rather strong dependence of the gluon on the value of a s 
is shown in Fig. [28] for various PDF sets. Note that even though, as discussed in Sect. 13.11 the total 
PDF+a s uncertainty can be obtained by determining these two uncertainties separately and adding 
results in quadrature [35], when determining the a s uncertainty, the value of a s in the factorization 
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Figure 28: Dependence on a s of the gluon distribution determined in the fits of Ref. [27] (CTEQ6.6, 
left), Ref. [301184] (MSTW08, center) and Ref. [38] (NNPDF1.2, right) (from Ref. [85]). 



formula Eq. ([2]) must be varied both in the PDFs and in the partonic cross section a. This is especially 
important when dealing with processes, such as Higgs production in gluon-gluon fusion [S51 [S3] (or 
also top production) which depend on the gluon PDF and start at a high order in a s . For this purpose, 
PDF sets corresponding to different values of a s are necessary and have thus been produced by several 
group (using sets with PDFs given as a continuous function of a s is in principle also possible, but 
practically more cumbersome and less accurate). 

The only other free parameters in the QCD Lagrangian are the quark masses, i.e., in the per- 
turbative regime, the heavy quark masses. Dependence of PDFs on them are larger than one might 
naively expect, and has two different origins which we will now discuss in turn. The first, is simply 
the fact that even though all perturbative computation are done up to power-suppressed terms in Q 2 , 
terms of order and ^ may have a non-negligible impact on PDF fits. This was brought to general 
attention by the comparison [27J of two calculations of the W and Z production cross-sections based 

2 

on PDF sets which differ mostly because one does include corrections (CTEQ6.1 [86]) while the 
other (CTEQ6.6 [27]) does not. It turns out (see Fig. 129ft that these corrections change the result by 
an amount which is almost twice the (statistical) PDF uncertainty (as defined in Sect. [5]): note that 
all the bands in Fig. (]29[) are at 90%c.l., and the one-cr intervals would be accordingly smaller. 

In order to understand what is going on here, one must recall the way heavy quark PDFs are 
defined, and heavy quarks are treated in perturbative QCD computations. Usually, perturbative 
QCD computations are performed in a decoupling renormalization scheme |87j . in which heavy quarks 
decouple from perturbative Feynman diagrams for scales much lower than the quark mass. In such a 
scheme, below threshold the number of flavours in the evolution equation for the strong coupling and 
for parton distributions Eq. (|22ft is equal to the number of light flavours. So in particular whereas there 
may still exist a non-perturbative nonvanishing "intrinsic" heavy quark PDF below threshold [19], it 
will only start evolving and coupling to other PDFs through evolution equations above threshold: for 
all practical purposes, below charm threshold n/ = 3. However, for scales which are much larger that 
the heavy quark mass, there is no reason to treat the heavy quark on a different footing from any other 

2 

light quark: for scales Q 2 much larger than the charm mass, all terms of order can be neglected, 
and there are rif = 4 massless flavours. The problem arises for scales which are higher but not much 
higher than the heavy quark mass: then, the heavy quark can be produced and it does not decouple, 

2 

yet it is not necessarily a good approximation to treat it as massless, i.e. to neglect gf corrections. 

This situation is illustrated in Fig. [BUI where we show the neutral-current photon-induced DIS F| 
structure function (i.e. the contribution to F2 in which the virtual photon couples to a charm quark) 
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Figure 29: The total W and Z cross section at the LHC with yfs = 14 TeV computed using PDFs 
determined including charm mass corrections (red squares |27j ) and neglecting them (blue stars [86]). 
Other processes and PDF sets not discussed here are also shown (from Ref. [27]). All uncertainty 
bands shown are at 90% c.l.. 



computed in various approximations. The two curves labelled ZM-VFN (purple, highest curves) are 
curves in which charm is simply treated as another massless flavour. The charm PDF is assumed to 
vanish below threshold, and above threshold a (massless) charm component is generated by pertur- 
bative evolution — note that even if an intrinsic component did exist, it should be small, and only 
non-negligible for very large x |19j . The two NLO and NNLO ZM-VFN (zero mass- variable flavour 
number) curves correspond to the case in which anomalous dimensions are computed up to 0(a 2 ) 
and O(a^) respectively: the solution to evolution equations then includes all contributions of order 
a k j n n ^ a jj orc [ ers - m ag; w ith n > k — 1 at NLO and with n > k — 2 at NNLO. They are called 
"variable flavour number" curves because the number of active flavours is increased by one unit when 
each heavy quark threshold is crossed. For light quarks, [i 2 = Qq — the starting scale of perturbative 
evolution — and for the charm contribution /j, 2 = m 2 . Given that Qq ~ m c , at high scale there is no 
reason not to include the charm contribution in evolution equations along with the light contributions. 

However, the evolution equations neglect all quark mass effects. Indeed, the curve labelled FFN 
(fixed flavour number) 0(a 2 ) shows the result of the computation fixed order in a S: but now with the 
dependence on m c fully included. This is a fixed- flavour number result because charm is included as a 
massive quark in partonic cross sections, but only the lighter flavours contribute to evolution equations 
and the running of a s . Its limit for Q 2 — > oo, which coincides with the contributions to the VF-ZFN 
NNLO, but up to 0(a 2 ) only, is labelled as FFNO. The FFNO and FFN results are different, and their 
difference, which is sizable for Q 2 < 50 GeV 2 is a measure of the size of mass suppressed contributions 

(note however that all curves come together at threshold, where F2 C vanishes). In this region, In ^ is 
not large and indeed the ZM-VFN NNLO curve and the FFNO are quite close: the inclusion of higher 
order powers of In ^ in the ZM-VFN NNLO has little effect. On the other hand, for Q 2 > 100 GeV 2 
the FFNO and FFN curve become quite close — their difference are mass-suppressed contributions 
which here have little effect — but the ZM-VFN curve is much higher: here In ^ is large and its 
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Figure 30: The deep-inelastic charm structure function Fi c computed in various approximations (see 
text), plotted as a function of scale for fixed x = 10 _1 . The same conventional PDFs are used for all 
plots. 



all-order inclusion in the ZM-VFN result is important. In fact, the difference between the FFN curve 
and the ZM-VFN curve is much larger than the difference between the NLO and NNLO ZM-VFN 
curves: it is important to resum In ^ to all orders, but whether this resummation is done to NLO or 

Tfl c 

NNLO has a much smaller impact. 

So summarizing: for lower Q 2 < 50 GeV 2 it is important to include mass corrections and not 

important to resum In Q% to all orders, so the most accurate result is the FFN one. For Q 2 < 50 GeV 2 
the converse is true and the most accurate result is the ZM-VFN one. The question is whether the 
two can be combined. That this is possible in principle to any perturbative order is a consequence of a 
factorization theorem for massive quarks proven to all orders in Ref . [88J . A practical implementation 
was suggested and worked out up to NLO in Ref. [89J - the so-called ACOT method, which was 
used to produce the massive CTEQ6.6 result of Fig. [29l Other implementations were suggested in 
Refs. \90\ [9Tj (TR method) for deep-inelastic scattering, and in Ref. [92] for hadroproduction (FONLL 
method, recently generalized to DIS and worked out up to NNLO in Ref. [93]). These various methods, 
which thus combine both a massive fixed-order FFN computation and a massless all-order ZM-VFN 
resummation, are often referred to as "GM-VFN" (general mass, variable flavour number) methods. 

The FONLL curve is also shown in Fig. [30l in a version (called FONLL-B) which includes all terms 
which are contained in the ZM-VFN NLO and FFN 0(a 2 ) calculations. It is clear that the FONLL- 
B curve nicely interpolates between the FFN curve, more accurate at low scale, and the ZM-VFN 
one, more accurate at high scale. Without entering a discussion of these various prescriptions, which 
would be quite technical, it is important to understand that (unless an error is made) all GM-VFN 
prescriptions include all the terms included in the ZM-VFN and FFN calculations: they may differ 
first, in the orders at which either of the ZM-VFN and FFN terms are included, and furthermore, 
because even with the same terms included, subleading terms are generally different, and may in 
practice have a non-negligible impact. The impact of these subleading terms can only be reduced by 
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Figure 31: Comparison of FFN, ZM-VFN and GM-VFN computations of the deep-inelastic charm 
structure function F 2c : left, FFN 0{a s ) and ZM-VFN NLO; right, FFN 0(a s ) and ZM-VFN NNLO. 
In each case the various GM-VFN combine all terms in the FFN and ZM-VFN results, and only differ 
by subleading terms. Results are shown as a function of x, at a scale little higher than the threshold. 
The same conventional PDFs are used for all plots. 



pushing to higher orders both the ZM-VFN and FFN contributions which are included in the GM 
result. This is illustrated in Fig. [3Tj where the FFN, ZM-VFN and GM-VFN results are compared. 
The GM-VFN are shown in the version adopted by CTEQ6.6 Ref. [27] of the ACOT method of Ref. [89] 
(only available at NLO), in the version adopted by MSTW08 [30] of the TR method of Refs. [9Ql EE], 
and with the FONLL method of Ref. [93]. It is clear that the differences between the various GM 
schemes are sizable, and only start decreasing at NNLO-0(«g). 

Considerable progress has been made recently in the benchmarking of all these GM schemes (see 
Ref. [94j), and the use of a GM-VFN scheme, which is highly desirable, has become more widespread. 
However, a systematic estimate of the uncertainties related to the choice of specific heavy quark scheme 
is not available in existing PDF sets. 

So far we have only discussed one of the two ambiguities related to the treatment of heavy quarks. 
The other one has simply to do with the value of the heavy quark mass. This has a considerable impact 
because, as we have seen, apart from possible intrinsic contributions, heavy quark PDFs are obtained 
by assuming them to vanish at threshold, and then to be generated by perturbative evolution. But 
changing the mass changes the position of the threshold, and thus the amount of evolution. This has 
been very recently argued to have a potentially non- negligible impact on phenomenology [96]. First 
PDF sets with variable values of the heavy quark masses have been presented in Ref. [95]. Some 
representative results are shown in Fig [32) when the heavy quark masses are varied in a range which 
is representative of their uncertainty, the heavy PDFs vary by an amount (more than 5%) which is 
of the same order or larger than the PDF uncertainty. This variation then propagates onto all other 
PDFs and especially the gluon, both due to mixing upon perturbative evolution, and to sum rules. 

It seems clear that for accurate phenomenology the values of the heavy quark masses will have 
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Figure 32: Dependence of the heavy quark PDFs at a typical electroweak scale on the heavy quark 
mass: left, charm; right, bottom (from Ref. |95]). 

to be treated analogously to what is now being done for the strong coupling: PDF sets with varying 
heavy quark masses will have to be provided, and the value of the masses will have to be varied 
simultaneously in the partonic cross section and in the PDF sets. 

Finally, it should be recalled that there is at present no reliable estimate of the effect on PDFs 
of the uncertainty due to the truncation of the perturbative expansion. This is possibly a relatively 
smaller effect in comparison to those we discussed so far, but this too will have to be included in PDF 
sets, for example by providing sets which correspond to different values of the renormalization and 
factorization scales, whose variation is a way of estimating unknown higher order corrections. All in 
all, a proper treatment of theoretical uncertainty is the current frontier in PDF uncertainties. 

6.2 PDFs for the LHC 

An ideal PDF determination should include all of the following features: 

• It should be based on a dataset which is as wide as possible in order to ensure that all relevant 
experimental information is retained; in particular, all processes discussed in Sect. H] should be 
used. 

• It should be based on a sufficiently general and unbiased parton parametrization and/or it should 
include a careful estimate of the effect of varying the parton parametrization. 

• It should provide PDF uncertainty bands which have been either a priori (tolerance) or a poste- 
rior (Monte Carlo) checked to provide consistently-sized confidence levels for individual experi- 
ments. 

• It should include heavy quark mass effects through a GM-VFN scheme, and provide an estimate 
of the uncertainties due to subleading terms not included in the scheme which has been adopted. 

• It should be based on computations performed at the highest available perturbative order, namely 
NNLO for evolution equations and for most or all of the processes used for PDF determination. 

• It should provide PDFs for a variety of values of a s , reasonably thinly spaced and in a range 
which is representative of the uncertainty on this parameter. 
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Figure 33: Comparison of the NLO Higgs cross section at the LHC as a function of the Higgs mass com- 
puted using various PDF sets. Left: status 2004 (from Ref. [78]) using Alekhin2002 [97], CTEQ6 [76] 
and MRST2001E [77] PDFs (y/s = 14 TeV). Right: status 2010 (from Ref. [85]) using CTEQ6.6 [27], 
MSTW2008 [30] and NNPDF2.0 [53] PDFs (y/s = 7 TeV). 



• Ditto for the values of heavy quark masses. 

• It should include an estimate of uncertainties related to the truncation of the perturbative 
expansion. 

At present, there exists no PDF determination which has all these features simultaneously. Which 
of these features is most important for accurate results it will be possible to say with certainty only 
after such a PDF determination is constructed. However, the various features have been listed in the 
approximate (decreasing) likely order of importance, at least in the opinion of this author, based on 
the arguments presented so far. 

Therefore, existing sets satisfy only some of these requirements. Current PDF sets are provides 
through a standard interface, LHAPDF [98J, which is regularly updated for the inclusion of new sets 
and updates. Current PDF sets and their salient features include the following (listed in order of 
decreasing number of datasets included) 

• MSTW08 [301 El ES] Latest in the MRS-MRST-MSTW series of fits (Ref. [99] and subsequent 
papers). All data of Sect. 01 plus HERA DIS jets. Hessian approach with parametrization 
Eq. (]26p for seven independent PDFs (three lightest flavour and antiflavours and the gluon); 
28 free parameters, 8 of which are held fixed in the determination of uncertainties; dynamical 
tolerance uncertainties. GM-VFN scheme. Results available for various values of a s , nib and 
m c . NNLO perturbative order (whenever available). 

• CT10 [28] Latest in the CTEQ series of fits (Ref. [100] and subsequent papers, see also |101j ). 
All data of Sect. [4] Hessian approach with parametrization Eq. (126ft for six independent PDFs 
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(two lightest flavour and antiflavours, total strangeness and the gluon); 26 free parameters; 
dynamical tolerance uncertainties. GM-VFN scheme. Results available for various values of a s . 
NLO perturbative order. 

• NNPDF2.0 [H] Latest in the NNPDF series of fits (Ref. [45J and subsequent papers). All data 
of Sect. [H Monte Carlo approach with neural network parametrization for seven independent 
PDFs (the three lightest flavours and antiflavours, total strangeness and the gluon); 259 (37 x 7) 
free parameters; cross-validation uncertainties. ZM-VFN scheme. Results available for various 
values of a s . NLO perturbative order. 

• JR |102| Latest in the GR-GRV-GJR series of fits (Ref. [103] and subsequent papers). All data 
of Sect. [J] except W and Z production. Hessian approach with parametrization Eq. (|26p for 
five independent PDFs (two lightest flavour and antiflavours and the gluon); 20 free parameters; 
fixed tolerance uncertainties. Results available for single value of a s , but Hessian includes a s . 
FFN scheme. NNLO perturbative order. 

• ABKM [104] Latest in the Alekhin series of fits (Ref. |73j and subsequent papers). All DIS 
data of Sect. U] and fixed-target virtual photon Drell-Yan production. Hessian approach with 
parametrization Eq. (|26p for six independent PDFs (two lightest flavour and antiflavours, total 
strangeness and the gluon); 21 free parameters; no tolerance (A% 2 = 1). Results available for 
single value of a s , but Hessian includes a s , and also heavy quark masses. FFN scheme. NNLO 
perturbative order. 

• HERAPDF1.0 |29| HERA only DIS data Hessian approach with parametrization Eq. (I26p for 

five independent PDFs (two lightest flavour and antiflavours, and the gluon); 10 free parameters; 
no tolerance(Ax 2 = 1) but inclusion of parametrization uncertainties. GM-VFN scheme. Results 
available for various values of a s . NNLO perturbative order. 

Detailed comparisons of these PDF sets is the subject of ongoing benchmarking exercises, with 
the aim of arriving at the most accurate common determination of PDFs. The successfulness of the 
enterprise can be gauged by putting side by side (see Fig. [33]) predictions for Higgs production via 
gluon-gluon fusion at the LHC with different PDF sets obtained as the first PDF with uncertainties 
were published [78J, with those obtained more recently as first collisions were taking place at the 
LHC |85j (see Fig. [33]) , The improvement is quite clear, and convergence between PDF sets has 
further improved since. 

The status of the computation of the simplest LHC processes (which have been suggested as 
"standard candles", e.g. as a means to measure the machine luminosity |106j ) is summarized in the 
plots of Fig. [341 where predictions for W^, Z and top total cross sections obtained at NLO using the 
ABKM09 [104], CTEQ6.6 [27], HERAPDF1.0 [29], GJR08 [107], MSTW08 [30] and NNPDF2.0 [41] 
sets are plotted as a function of a s . The dotted lines show how each prediction can be extrapolated to 
different values of a s (the extrapolation is not available for the ABKM and GJR sets since these are 
only published for a single value of a s , though in principle the information on the a s dependence is 
contained in their covariance matrix). It is clear that first, once brought to a common value of a s the 
various sets are in fair agreement, and second, the agreement further improves when restricted to PDF 
sets that are based on more similar assumptions (such as common dataset, number of independent 
PDFs etc.). 

An interesting question that remains is how one can proceed when some disagreement remains, 
and there is no clear reason to favor one set over the other. In this case, Bayesian statistics provides an 



42 



NLO W* -> rv at the LHC (vs = 7 TeV) 



si 
c 

„ 6.4 - 



f 6.2 - 
+ 

m - 

S 5.8- 



5.6 — Vertical error 
Inner: PDF oi 
Outer: PDF+c 



68% C.L. PDF 
• MSTW08 

■ CTEQ6.6 

■ \ NNPDF2.0 

T HERAPDF1.0 
ABKM09 
GJR08 



14 0.116 0.118 0.12 



0.122 0.124 

4 



a s («l§ 



NLO W -» fv at the LHC (vs = 7 TeV) 



4.4 r- 
4.3 r 



T 
5 



3 

D 



Vertical error bars 
Inner: PDF only 
Outer: PDF*o s 



68% C.L. PDF 
• MSTW08 
■ CTEQ6.6 
A NNPDF2.0 
T HERAPDF1 .0 

ABKM09 

GJR08 



114 0.116 0.118 0.12 0.122 0.124 



a-(M*: 



t 

o 
£0 

°N 

e 



1.02p-r 

1 : 

0.98- 
0.96 r 
0.94 ^ 
0.92 '- _ 

0.9 ^ 
0.88 r 
0.86- 
0.84 ^ 



NLO Z u -> l*r at the LHC (.s = 7 TeV) 



NLO tt cross sections at the LHC (\s = 7 TeV) 



Vertical error bars 
Inner: PDF only 
Outer: PDFta 



68% C.L. PDF 
• MSTW08 
■ CTE06.6 

A NNPDF2.0 
T HERAPDF1.0 
ABKM09 
GJR08 



14 0.116 



0.122 0.124 

a s {M 2 ) 



^ 190 

a. 



170 
160 
150 
140 
130 

1?A 



Vertical error bars 
Inner: PDF only 
Outer: PDF+ct,. 



i8% C.L. PDF 
• MSTW08 
■ CTEQ6.6 
A NNPDF2.0 
T HERAPDF1.0 

ABKM09 

GJR08 



14 0.116 



0.122 0.124 



Figure 34: LHC "standard candles" computed using various PDF sets: total cross section for the 
production of W + (top left), W~ (top right), Z (bottom left) tt (bottom right) (from Ref. |lU5j ). 



answer [108J: in Bayesian terms, the probability distribution for PDFs is a distribution of true values, 
i.e. P(f) expresses the degree of belief that the true value is indeed /. Then, given two different, but 
a priori equally reliable determinations P\{f) and ^(Z) of the probability distribution, the combined 
probability is just P(f) = \{P\{f) + P^if)): with obvious generalizations if the determinations of the 
probability distribution are more than two or not all equally likely. In a Monte Carlo approach this 
is especially easy to implement: the combined probability distribution is obtained by simply taking 
a Monte Carlo sample in which half of the replicas come from either distribution. A 68%c.l. of 
the combined probability is then simply the region which contains the central 68% of all the given 
distributions, i.e. 68% of the combined replica set. 

In practice, a reasonable approximation to the Bayesian estimate may well consist of simply taking 
the envelope (i.e. the union) of the 68% intervals of the probability distributions which are being 
combined: this typically leads to a slight overestimate of the error band, because some of the replicas 
in the outer 32% of one of the distribution may fall within the central 68% of another distribution, 
but this in practice is often a small correction, hence the envelope prescription can be a simple and 
effective way of combining probability distributions. 

7 Conclusion 

The physics of parton distributions is now close to a beginning, rather than a conclusion: LHC 
data for many of the processes discussed in Sect. H] are being collected and will soon be published. 



43 



The availability of LHC data is likely to change significantly our perspective on the subject. The 
kinematic range of "old" processes such as Drell-Yan will be extended and their accuracy will improve. 
Processes which are at present not competitive will become important (such as perhaps prompt photon 
production). Entirely new processes will play a role, such as Higgs production. Hopefully, classes of 
new physical processes will be discovered: whatever their nature, they will be observed in proton 
collisions, and thus pose new challenges to our understanding of the nucleon. It may turn out that 
an interplay with new machines, such as an electron-proton LHeC collider [109 , or perhaps even a 
neutrino factory [32] may be necessary in order to exploit fully the LHC potential. A review of this 
subject written ten years from now is likely to be quite different from the present one, and possibly 
much more interesting. 
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